Abstract. The concept of exponential family is generalized by the simple and general
exponential forms. Simple and general potentials are introduced. Maximum Entropy
and Maximum Likelihood tasks are defined. The ML task on the simple exponential
form and the ME task on the simple potentials are proved to be complementary in
set-up and identical in solutions. The ML task on the general exponential form and
the ME task on the general potentials are weakly complementary, leading to the same
necessary conditions. A hypothesis about complementarity of the ML and MiniMax
Entropy tasks and identity of their solutions, prompted by a special case treated
analytically and by several numerical investigations, is suggested in this case.

MiniMax Ent can be viewed as a generalization of MaxEnt for parametric linear 
inverse problems, and its complementarity with ML as yet another argument in 
favor of Shannon's entropy criterion. 

1. INTRODUCTION



The relationship between Maximum Likelihood (ML) and Maximum Entropy
(ME, MaxEnt) methods has been noted and investigated many times, yet it
remains intricate and puzzling. Jaynes (Jaynes, 1982) is worth quoting
at length on the subject:

..., any MaxEnt solution also defines a particular model for which 
the predictive distribution using the ML estimates of the parame- 
ters, is identical with the MaxEnt distribution. This is essentially 
the Pitman-Koopman theorem used backwards; given any data the 
MaxEnt distribution having exponential form, in effect creates a 
model for which those data would have been sufficient statistics. 
This can give one deeper understanding of the terms 'information'
and 'sufficiency' in statistics, but only after some deep thought. As 
a result, almost every conceivable opinion about the relationship 
between MaxEnt and ML can be found expressed in the current 
literature. 

Some of these opinions (at different levels of generality) can be found
in (Kullback, 1968), (Barndorff-Nielsen, 1978), (Dutta, 1966), (Golan,
1998), (Campbell, 1970), (Nishii, 1989), (Mohammad-Djafari and Idier,
1991), (Mohammad-Djafari, 1998). Adding to them other views on MaxEnt
itself (such as interpreting Shannon's entropy function as minus the expected
log-likelihood, or restrictively interpreting the MaxEnt recovered
distribution as the Maxwell-Boltzmann special member of the exponential family,
or insisting that Jaynes' die problem cannot be solved by the ML method)
makes investigating the relationship between MaxEnt and ML an adventurous
undertaking.

In the present article we make a clear distinction between the operational
modes of the MaxEnt and ML methods, by defining a MaxEnt task (as
a simple instance of the MaxEnt method) and also an ML task. An analogy
between Boltzmann's deduction of the equilibrium distribution of an ideal
gas in an external potential field and a probability distribution leads
us to extend the exponential family into the general exponential form, and
to introduce the notions of simple potential and general potential. The concept
of complementarity is introduced, and complementarity of the ME task on
simple potentials and the ML task on the simple exponential form is proved.
Finally, a hypothesis about complementarity of the MiniMaxEnt task on
general potentials and the ML task on the general exponential form, suggested
by a simple case treated analytically as well as by several numerical
calculations, is put forward. The results extend immediately to Relative
Entropy Maximization (REM) / I-divergence minimization.



2. DEFINITIONS AND NOTATION 

The notion of exponential family is extended into simple and general 
exponential forms. 

Definition 1. Let X be a random variable with pmf/pdf $f_X(x)$. If
$f_X(x)$ can be written in the form

$$f_X(x \mid \lambda) = k(\lambda)\, e^{-U(x,\lambda)}$$

where $U(x,\lambda)$ is

$$U(x,\lambda) = \lambda' u(x)$$

a linear combination of functions $u(x)$ not depending on other parameters,
and $k(\lambda)$ is the normalizing factor, then it has simple exponential
form. $u(x)$ is called simple potential.

If the pmf/pdf can be written in the form

$$f_X(x \mid \lambda, \alpha) = k(\lambda,\alpha)\, e^{-U(x,\lambda,\alpha)}$$

where $U(x,\lambda,\alpha)$ is

$$U(x,\lambda,\alpha) = \lambda' u(x,\alpha)$$

a linear combination of functions $u(x,\alpha)$ depending on other parameters
$\alpha$, and $k(\lambda,\alpha)$ is the normalizing factor, then it has general
exponential form. $u(x,\alpha)$ is called general potential.

The $U(\cdot)$ function is called total potential.

Note. Any class of pmf/pdf which can be written in the exponential 
form is equivalently characterized by its exponential form pmf/pdf or 
by its potentials. 

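Example 1. The $\Gamma(\alpha, \beta)$ distribution has simple exponential form, with
total potential $U(x,\lambda) = \lambda_1 x + \lambda_2 \ln x$, where $\lambda_1 = 1/\beta$ and
$\lambda_2 = 1 - \alpha$; $u_1(x) = x$ and $u_2(x) = \ln x$ are the potentials. The
normalizing factor is $k(\lambda_1,\lambda_2) = \lambda_1^{1-\lambda_2} / \Gamma(1-\lambda_2)$.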

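The Logistic$(\mu, \beta)$ distribution has general exponential form with total
potential $U(x,\lambda,\alpha) = \lambda_1 u_1(x,\alpha) + \lambda_2 u_2(x,\alpha)$, with
$\lambda = [1, 2]$, the potentials

$$u_1(\cdot) = \frac{x-\alpha_1}{\alpha_2}, \qquad u_2(\cdot) = \ln\left(1 + e^{-\frac{x-\alpha_1}{\alpha_2}}\right),$$

and $\alpha = [\mu, \beta]$. The normalizing factor is $k(\alpha_2) = 1/\alpha_2$.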

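The discrete normal distribution $dn(\lambda,\alpha)$, defined over a support
$x_1, \ldots, x_m$ by

$$f_X(x_i \mid \lambda, \alpha) = \frac{e^{-\lambda(x_i-\alpha)^2}}{\sum_{l=1}^{m} e^{-\lambda(x_l-\alpha)^2}}$$

has total potential $U(x,\lambda,\alpha) = \lambda(x-\alpha)^2$. It can be equivalently
expressed in simple form with $U(x,\lambda_1,\lambda_2) = \lambda_1 x + \lambda_2 x^2$, where
$\lambda_1 = -2\alpha\lambda$ and $\lambda_2 = \lambda$.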
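The equivalence of the two parametrizations of the discrete normal distribution can be checked numerically. The following is a minimal sketch, not part of the original text: the support, the parameter values and the helper name `exp_form_pmf` are hypothetical choices made only for illustration.

```python
import math

def exp_form_pmf(support, total_potential):
    """Normalized pmf p_i = k * exp(-U(x_i)) on a finite support."""
    weights = [math.exp(-total_potential(x)) for x in support]
    z = sum(weights)                      # z = 1/k, the normalizing sum
    return [w / z for w in weights]

support = [0, 1, 2, 3, 4, 5]              # hypothetical support
lam, alpha = 0.3, 2.2                     # hypothetical parameter values

# General form: U(x, lambda, alpha) = lambda * (x - alpha)^2
p_general = exp_form_pmf(support, lambda x: lam * (x - alpha) ** 2)

# Simple form: U(x, lambda_1, lambda_2) = lambda_1 * x + lambda_2 * x^2
lam1, lam2 = -2 * alpha * lam, lam
p_simple = exp_form_pmf(support, lambda x: lam1 * x + lam2 * x ** 2)

# The constant term lambda * alpha^2 is absorbed by the normalizing factor,
# so both forms should describe the same pmf up to rounding error.
max_gap = max(abs(a - b) for a, b in zip(p_general, p_simple))
```

The two total potentials differ only by the constant $\lambda\alpha^2$, which cancels in the normalization; this is why `max_gap` is expected to be at the level of floating-point rounding.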

Standard definitions of moment and sample mean are extended. 

Definition 2. The V-moment of a random variable X, $\mu(V)$, is for any
function $V(X,\alpha)$ defined as

$$\mu(V) = E\,V(X,\alpha)$$

Definition 3. The sample V-moment of a random variable X, $m(V)$, is for
any function $V(X,\alpha)$ defined as

$$m(V) = \sum_{i=1}^{m} r_i V(x_i,\alpha)$$

where $r_i$ is the frequency of the i-th element of the support in the sample.

Definition 4. Let $\mu(V)$, $m(V)$ be the V-moment and the sample V-moment,
respectively. Then the requirement of their equality

$$\mu(V) = m(V)$$

will be called the V-moment consistency condition.
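Definitions 2 to 4 translate directly into a few lines of code for a finite support. The sketch below is illustrative only: it assumes that "frequency" means relative frequency (so the $r_i$ sum to one), and the function names, the die support and the sample are hypothetical.

```python
def v_moment(p, support, v):
    """mu(V): expectation of V under the pmf p on the given support."""
    return sum(p_i * v(x_i) for p_i, x_i in zip(p, support))

def sample_v_moment(sample, support, v):
    """m(V): frequency-weighted average of V over the support."""
    n = len(sample)
    r = [sample.count(x_i) / n for x_i in support]   # relative frequencies r_i
    return sum(r_i * v(x_i) for r_i, x_i in zip(r, support))

support = [1, 2, 3, 4, 5, 6]            # faces of a die (hypothetical data)
sample = [1, 2, 2, 3, 6, 6, 6, 4]       # hypothetical observed throws
p_uniform = [1 / 6] * 6

mu = v_moment(p_uniform, support, lambda x: x)
m = sample_v_moment(sample, support, lambda x: x)

# V-moment consistency condition of Definition 4, checked up to rounding
consistent = abs(mu - m) < 1e-9
```

Here the uniform pmf is not consistent with the sample for $V(x) = x$; the ME task of Definition 6 looks for the most entropic p that is.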

Notation. $\lambda$, $u(\cdot)$, $\mu(\cdot)$ and $m(\cdot)$ are [J, 1] vectors, indexed by j. x,
p and r are [m, 1] vectors, indexed by i, with m finite or infinite. $\alpha$ is a
[T, 1] vector indexed by t.

Since entropy maximization can reasonably be constrained by constraints
other than the moment consistency constraints (see for instance
(Golan, Judge and Miller, 1996), (Mittelhammer et al., 2000), (Golan,
Judge and Perloff, 1996) or the proceedings of MaxEnt conferences), we will,
in order to be specific, speak about an ME task. An ML task
is defined as well. The complementarity results obtained for the ME task
easily extend to the more general constraints used with Shannon's
entropy maximization criterion.

Definition 5. ML task on $f_X(x \mid \theta)$. Let $X_1, X_2, \ldots, X_n$ be a random
sample from population $f_X(x \mid \theta)$. The maximum likelihood task on
$f_X(x \mid \theta)$ is to find the maximum likelihood estimator $\hat{\theta}$ of $\theta$,
given the sample.

Definition 6. ME task on u(-). Given a sample and a vector of known 
potential functions u(-), the maximum entropy task is to find the most 
entropic distribution p consistent with the set of u-moment consistency 
conditions. 

3. ML TASK AND ME TASK

3.1. Simple exponential form, simple potential case 

THEOREM 1. Complementarity of ML and ME tasks, identity of
solutions

Let $X_1, X_2, \ldots, X_n$ be a random sample. Then,

i) complementarity of tasks

a) the ML estimator $\hat{\lambda}$ of $\lambda$ on the simple exponential form
$f_X(x \mid \lambda) = k(\lambda)e^{-\lambda' u(x)}$ is obtained as a solution of the system of
J $u_j$-moment consistency conditions,

b) the most entropic distribution p satisfying the system of J
$u_j$-moment consistency conditions is the simple exponential form pmf/pdf
$f_X(x \mid \hat{\lambda})$.

ii) identity of solutions

necessary and sufficient conditions for the ML task on the simple exponential
form pmf/pdf $f_X(x \mid \lambda) = k(\lambda)e^{-\lambda' u(x)}$ and the ME task on the simple
potentials $u(x)$ are identical, and they are

$$\mu(u_j) = m(u_j), \qquad j = 1,2,\ldots,J \qquad (1)$$

Proof. 

Discrete r.v. case.
1. ML task.

$$\max_{\lambda}\; l(\lambda) = \ln(k(\lambda)) - \sum_{j=1}^{J}\sum_{i=1}^{m} \lambda_j r_i u_j(x_i)$$

leads to a system of J first order conditions (FOC)

$$\mu(u_j) = m(u_j), \qquad j = 1,2,\ldots,J$$

The corresponding Hessian matrix of second derivatives of the loglikelihood
function with respect to (wrt) $\lambda$ is

$$H_{ML} = -\begin{pmatrix}
Var(u_1) & Cov(u_1,u_2) & \cdots & Cov(u_1,u_J) \\
Cov(u_2,u_1) & Var(u_2) & \cdots & Cov(u_2,u_J) \\
\vdots & \vdots & \ddots & \vdots \\
Cov(u_J,u_1) & Cov(u_J,u_2) & \cdots & Var(u_J)
\end{pmatrix}$$

which is negative definite, assuring that a unique global maximum was attained.
Thus, the ML task on the simple exponential form of pmf is identical with
solving a system of J non-linear equations, the $u_j$-moment consistency
conditions.
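The step from the loglikelihood to the moment consistency conditions can be spelled out. The following worked derivation is a sketch for the discrete case, assuming only that $k(\lambda)$ is the reciprocal of the normalizing sum over the support; it is added for illustration and is not quoted from the original proof.

```latex
\begin{align*}
\ln k(\lambda) &= -\ln \sum_{i=1}^{m} e^{-\lambda' u(x_i)}, \\
\frac{\partial \ln k(\lambda)}{\partial \lambda_j}
  &= \frac{\sum_{i=1}^{m} u_j(x_i)\, e^{-\lambda' u(x_i)}}
          {\sum_{i=1}^{m} e^{-\lambda' u(x_i)}} = \mu(u_j), \\
\frac{\partial l(\lambda)}{\partial \lambda_j}
  &= \mu(u_j) - m(u_j) = 0, \\
\frac{\partial^2 l(\lambda)}{\partial \lambda_j\, \partial \lambda_l}
  &= -\big(\mu(u_j u_l) - \mu(u_j)\,\mu(u_l)\big) = -\mathrm{Cov}(u_j, u_l).
\end{align*}
```

The last line shows why the Hessian is the negative of the covariance matrix of the potentials.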

2. ME task.

$$\max_{p}\; H(p) = -\sum_{i=1}^{m} p_i \ln p_i$$

subject to

$$\mu(u_j) = m(u_j), \qquad j = 1,2,\ldots,J$$

which can be accomplished by means of the Lagrangean

$$L(p) = -\sum_{i=1}^{m} p_i \ln p_i + \sum_{j=1}^{J} \lambda_j \left(m(u_j) - \mu(u_j)\right)$$

leading to a system of m FOC

$$p_i = e^{-\lambda' u(x_i)}, \qquad i = 1,2,\ldots,m$$

which, after a normalization, gives the simple exponential form as the
solution.

The corresponding Hessian matrix of second derivatives of the Lagrangean
wrt p is

$$H_{ME} = \begin{pmatrix}
-1/p_1 & 0 & \cdots & 0 \\
0 & -1/p_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & -1/p_m
\end{pmatrix}$$

which for $p_i > 0$ is negative definite, satisfying the sufficient conditions for a
unique global maximum.

Thus, the ME task with the system of J $u_j$-moment consistency conditions
leads to the simple exponential form, where $\lambda$, the Lagrange
multipliers, have to be found out of the system of nonlinear equations (1).
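The role of the normalization step can be made explicit. In the sketch below the multiplier $\lambda_0$ for the constraint $\sum_l p_l = 1$ is an added symbol, not one used in the original text; it is introduced only to show where the constant factor goes.

```latex
\begin{align*}
\frac{\partial}{\partial p_i}\Big[L(p) + \lambda_0\Big(1 - \sum_{l=1}^{m} p_l\Big)\Big]
  &= -\ln p_i - 1 - \lambda' u(x_i) - \lambda_0 = 0, \\
p_i &= e^{-(1+\lambda_0)}\, e^{-\lambda' u(x_i)}, \\
\sum_{l=1}^{m} p_l = 1 \;&\Longrightarrow\;
p_i = \frac{e^{-\lambda' u(x_i)}}{\sum_{l=1}^{m} e^{-\lambda' u(x_l)}} .
\end{align*}
```

The constant $e^{-(1+\lambda_0)}$ plays the role of the normalizing factor $k(\lambda)$ of Definition 1.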

Continuous r.v. case. 

1. ML task - in analogy with the discrete case proof. 

2. ME task. 



$$\max_{f_X(x)}\; H(f_X(x)) = -\int f_X(x)\ln(f_X(x))\,dx$$

subject to

$$\mu(u_j) = m(u_j), \qquad j = 1,2,\ldots,J$$

which can be accomplished by means of the Lagrangean functional

$$L(f_X(x)) = -f_X(x)\ln(f_X(x)) - \sum_{j=1}^{J} \lambda_j u_j(x) f_X(x)$$

leading to Euler's equation (FOC)

$$\frac{\partial L(\cdot)}{\partial f_X(x)} = 0$$

which, after a normalization, gives the simple exponential form

$$f_X(x \mid \lambda) = \frac{e^{-\lambda' u(x)}}{\int e^{-\lambda' u(x)}\,dx}$$



Note. ML task on simple exponential form and ME task on simple 
potentials are complementary in the sense, that where one starts the 
other one ends, and vice versa. ML starts with known simple exponen- 
tial form of pmf/pdf and ends up with ML estimators of the parameters, 
found out of the potential moment consistency equations. ME, working 
on the sample, starts with assumed form of potential functions, forming 
potential moment consistency constraints. The most entropic distribu- 
tion resolved is just the exponential form pmf/pdf ML has assumed. 
And the ME estimators of its parameters are the same as the ML 
estimators. We say that ML task on simple exponential form pmf/pdf 
and ME task on simple potentials are complementary. 

ML and ME tasks are complementary in set-up but identical in 
solution. Both the tasks end up with the same mathematical prob- 
lem of solving estimators of A out of the system of potential moment 
consistency equations (1). 



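Example 2. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from the
discrete normal distribution $dn(\lambda_1,\lambda_2)$, taken in the simple exponential
form.

The ML task of estimation leads to solving $\lambda_1$, $\lambda_2$ out of the system of
equations

$$\frac{\sum_{i=1}^{m} x_i\, e^{-(\lambda_1 x_i+\lambda_2 x_i^2)}}{\sum_{i=1}^{m} e^{-(\lambda_1 x_i+\lambda_2 x_i^2)}} = \sum_{i=1}^{m} r_i x_i$$

$$\frac{\sum_{i=1}^{m} x_i^2\, e^{-(\lambda_1 x_i+\lambda_2 x_i^2)}}{\sum_{i=1}^{m} e^{-(\lambda_1 x_i+\lambda_2 x_i^2)}} = \sum_{i=1}^{m} r_i x_i^2$$

which is just the system of x-moment and $x^2$-moment consistency
conditions.

The ME task constrained by the system of x-moment and $x^2$-moment
consistency conditions

$$\sum_{i=1}^{m} p_i x_i = \sum_{i=1}^{m} r_i x_i, \qquad \sum_{i=1}^{m} p_i x_i^2 = \sum_{i=1}^{m} r_i x_i^2 \qquad (2)$$

finds the most entropic distribution consistent with the constraints to
have the form (after normalization)

$$p_i = \frac{e^{-(\lambda_1 x_i+\lambda_2 x_i^2)}}{\sum_{l=1}^{m} e^{-(\lambda_1 x_l+\lambda_2 x_l^2)}} \qquad (3)$$

where $\lambda_1$, $\lambda_2$ should be found out of the system (2), after plugging (3)
in.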
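The moment consistency system of Example 2 has no closed-form solution, but it can be solved numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure: the support, the relative frequencies `r`, and the damped Newton iteration are hypothetical choices. The same vector solves both tasks, since the gradient of the loglikelihood is exactly the difference between the moments and the sample moments.

```python
import math

support = [0, 1, 2, 3, 4, 5]                     # hypothetical support
r = [0.05, 0.15, 0.30, 0.25, 0.15, 0.10]         # hypothetical relative frequencies
u = [lambda x: x, lambda x: x * x]               # simple potentials u_1, u_2

def log_z(lam):
    """ln of the normalizing sum, i.e. -ln k(lambda)."""
    return math.log(sum(math.exp(-sum(l * uj(x) for l, uj in zip(lam, u)))
                        for x in support))

def pmf(lam):
    """Simple exponential form pmf on the support."""
    z = math.exp(log_z(lam))
    return [math.exp(-sum(l * uj(x) for l, uj in zip(lam, u))) / z
            for x in support]

def moments(weights):
    """Weighted sums of the potentials: mu(u_j) or m(u_j)."""
    return [sum(w * uj(x) for w, x in zip(weights, support)) for uj in u]

m = moments(r)                                   # sample moments m(u_1), m(u_2)

def loglik(lam):
    """Per-observation loglikelihood ln k(lambda) - sum_j lambda_j m(u_j)."""
    return -log_z(lam) - sum(l * mj for l, mj in zip(lam, m))

def solve(tol=1e-10, max_iter=100):
    lam = [0.0, 0.0]
    for _ in range(max_iter):
        p = pmf(lam)
        mu = moments(p)
        g = [mu[j] - m[j] for j in range(2)]     # gradient of loglik
        if max(abs(gj) for gj in g) < tol:
            break
        # covariance matrix of (u_1, u_2) under p; the Hessian is its negative
        c = [[sum(pi * u[a](x) * u[b](x) for pi, x in zip(p, support))
              - mu[a] * mu[b] for b in range(2)] for a in range(2)]
        det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
        step = [(c[1][1] * g[0] - c[0][1] * g[1]) / det,
                (c[0][0] * g[1] - c[1][0] * g[0]) / det]
        t = 1.0                                  # halve the step if loglik drops
        while t > 1e-8 and loglik([lam[j] + t * step[j] for j in range(2)]) \
                < loglik(lam) - 1e-12:
            t /= 2
        lam = [lam[j] + t * step[j] for j in range(2)]
    return lam

lam_hat = solve()
p_hat = pmf(lam_hat)
gap = max(abs(a - b) for a, b in zip(moments(p_hat), m))
```

At convergence `p_hat` has the normalized form (3) and satisfies the constraints (2) up to the tolerance, so it is simultaneously the ML fit and the ME solution for these hypothetical data.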

In passing we mention the identity of ML and the modified method
of moments (MMM) in the case of the exponential family, discovered by
(Huzurbazar, 1949) and explored further by (Davidson and Solomon,
1974). The identity holds also for the simple exponential form, making
ME complementary to both ML and MMM. Note that MMM
starts with moment consistency conditions, where the notion of
moments is extended as done here by Definitions 2, 3, 4.

3.2. General exponential form, general potential case 

Complementarity of the general exponential form ML task and the general
potential ME task cannot be assessed analytically to its full extent, because
the sufficient conditions for a maximum of the likelihood or entropy function
do not, in general, allow for it. We show, analytically, that the ML task on the
general exponential form and the ME task on the general potentials lead
to the same FOC's. This could be called 'weak complementarity'.

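THEOREM 2. Let $X_1, X_2, \ldots, X_n$ be a random sample. Then, necessary
conditions for

a) the ML task on the general exponential form pmf/pdf
$f_X(x \mid \lambda,\alpha) = k(\lambda,\alpha)e^{-\lambda' u(x,\alpha)}$

b) the ME task on the general potentials $u(x,\alpha)$

are identical, and they are

$$\mu(u_j) = m(u_j), \qquad j = 1,2,\ldots,J$$

$$\mu(U'_{\alpha_t}) = m(U'_{\alpha_t}), \qquad t = 1,2,\ldots,T \qquad (4)$$

where $U'_{\alpha_t} = \partial U/\partial \alpha_t = \lambda'\,\partial u/\partial \alpha_t$.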

Proof.

Discrete r.v. case.

1. ML task.

$$\max_{\lambda,\alpha}\; l(\lambda,\alpha) = \ln(k(\lambda,\alpha)) - \sum_{j=1}^{J}\sum_{i=1}^{m} \lambda_j r_i u_j(x_i,\alpha)$$

leads to a system of J + T first order conditions

$$\mu(u_j) = m(u_j), \qquad j = 1,2,\ldots,J$$

$$\mu(U'_{\alpha_t}) = m(U'_{\alpha_t}), \qquad t = 1,2,\ldots,T$$

2. ME task.

$$\max_{p(\alpha)}\; H(p(\alpha)) = -\sum_{i=1}^{m} p_i \ln p_i$$

subject to

$$\mu(u_j) = m(u_j), \qquad j = 1,2,\ldots,J$$

which can be accomplished by means of the Lagrangean

$$L(p(\alpha)) = -\sum_{i=1}^{m} p_i \ln p_i + \sum_{j=1}^{J} \lambda_j \left(m(u_j) - \mu(u_j)\right)$$

leading to a system of m + T FOC's

$$p_i = e^{-\lambda' u(x_i,\alpha)}, \qquad i = 1,2,\ldots,m$$

$$\sum_{j=1}^{J} \lambda_j \left(\sum_{i=1}^{m} (r_i - p_i)\,\frac{\partial u_j(x_i,\alpha)}{\partial \alpha_t}\right) = 0, \qquad t = 1,2,\ldots,T \qquad (5)$$

The most entropic distribution after normalization takes the general
exponential form

$$p_i = \frac{e^{-\lambda' u(x_i,\alpha)}}{\sum_{l=1}^{m} e^{-\lambda' u(x_l,\alpha)}}, \qquad i = 1,2,\ldots,m$$

where the 'ME estimators' of $\lambda$ have to be found out of the system (5).

The T equations of the system (5) simplify into

$$\mu(U'_{\alpha_t}) = m(U'_{\alpha_t}), \qquad t = 1,2,\ldots,T$$

which are the same as the T equations of the FOC's for the ML task (4).

Thus, the ME and ML tasks indeed lead to the same necessary
conditions (4).

Continuous r.v. case. 

In analogy to the proof of Theorem 1. 

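COROLLARY 1. Due to the linearity of $U(x,\lambda,\alpha)$ in $\lambda$, the necessary
conditions (4) can be rewritten in a compact form

$$\mu(U'_{\theta}) = m(U'_{\theta}), \qquad \theta \in \{\lambda_1,\ldots,\lambda_J,\alpha_1,\ldots,\alpha_T\}$$

since $U'_{\lambda_j} = \partial U/\partial \lambda_j = u_j$.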

Example 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from the discrete
normal distribution $dn(\lambda,\alpha)$, taken in the general exponential form, so
$u(x,\alpha) = (x-\alpha)^2$.

The ML task of estimation leads to solving $\lambda$, $\alpha$ out of the system of
equations

$$\mu(u) = m(u)$$

$$\mu\left(\frac{\partial u}{\partial \alpha}\right) = m\left(\frac{\partial u}{\partial \alpha}\right) \qquad (6)$$

The ME task constrained by the moment consistency condition

$$\sum_{i=1}^{m} p_i (x_i-\alpha)^2 = \sum_{i=1}^{m} r_i (x_i-\alpha)^2$$

leads to the FOC's

$$p_i = e^{-\lambda(x_i-\alpha)^2}$$

$$\mu\left(\frac{\partial u}{\partial \alpha}\right) = m\left(\frac{\partial u}{\partial \alpha}\right)$$

where $\lambda$, $\alpha$ have to be found out of (6), after normalizing the p's.

So, the ML and ME tasks lead to the same necessary conditions. Also,
note that the ML and ME estimators are the same as in Example
2, where $dn(\cdot)$ was taken in the simple exponential form.
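The closing remark of Example 3 can be verified in two lines. The derivation below is a sketch that uses only $u(x,\alpha) = (x-\alpha)^2$ and the linearity of $\mu(\cdot)$ and $m(\cdot)$; it assumes that the frequencies $r_i$ sum to one, so that constants have the same moment and sample moment.

```latex
\begin{align*}
\frac{\partial u}{\partial \alpha} = -2(x-\alpha):\quad
\mu\Big(\frac{\partial u}{\partial \alpha}\Big) = m\Big(\frac{\partial u}{\partial \alpha}\Big)
  &\;\Longleftrightarrow\; \mu(x) = m(x), \\
\mu\big((x-\alpha)^2\big) = m\big((x-\alpha)^2\big)
  &\;\Longleftrightarrow\; \mu(x^2) - 2\alpha\,\mu(x) = m(x^2) - 2\alpha\,m(x) \\
  &\;\Longrightarrow\; \mu(x^2) = m(x^2) \quad \text{once } \mu(x) = m(x).
\end{align*}
```

So the two conditions in $(\lambda,\alpha)$ are equivalent to the x-moment and $x^2$-moment consistency conditions of Example 2, with $\lambda_1 = -2\alpha\lambda$ and $\lambda_2 = \lambda$.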

Regarding the sufficient conditions, the following Theorem states the
second derivatives for both tasks. Whether they are identical cannot,
in general, be assessed analytically.

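THEOREM 3. Let $U'_{\theta}$ and $U''_{\theta\phi}$ denote the first and second partial
derivatives of the total potential wrt the parameters in the subscript.
Second derivatives for the ML task are

$$\frac{\partial^2 l(\lambda,\alpha)}{\partial \lambda_j^2} = -Var(U'_{\lambda_j})$$

$$\frac{\partial^2 l(\lambda,\alpha)}{\partial \lambda_j \partial \lambda_l} = -Cov(U'_{\lambda_j}, U'_{\lambda_l})$$

$$\frac{\partial^2 l(\lambda,\alpha)}{\partial \alpha_t^2} = -\left(Var(U'_{\alpha_t}) + m(U''_{\alpha_t}) - \mu(U''_{\alpha_t})\right)$$

$$\frac{\partial^2 l(\lambda,\alpha)}{\partial \alpha_t \partial \alpha_\tau} = -\left(Cov(U'_{\alpha_t}, U'_{\alpha_\tau}) + m(U''_{\alpha_t\alpha_\tau}) - \mu(U''_{\alpha_t\alpha_\tau})\right)$$

$$\frac{\partial^2 l(\lambda,\alpha)}{\partial \lambda_j \partial \alpha_t} = -Cov(U'_{\lambda_j}, U'_{\alpha_t}) - m(U''_{\lambda_j\alpha_t}) + \mu(U''_{\lambda_j\alpha_t})$$

and for the ME task they are

$$\frac{\partial^2 L(p(\alpha))}{\partial p_i^2} = -\frac{1}{p_i}$$

$$\frac{\partial^2 L(p(\alpha))}{\partial \alpha_t^2} = Var(U'_{\alpha_t}) + m(U''_{\alpha_t}) - \mu(U''_{\alpha_t})$$

$$\frac{\partial^2 L(p(\alpha))}{\partial \alpha_t \partial \alpha_\tau} = m(U''_{\alpha_t\alpha_\tau}) - \mu(U''_{\alpha_t\alpha_\tau}) + Cov(U'_{\alpha_t}, U'_{\alpha_\tau})$$

Proof. Differentiating twice the loglikelihood function and the
Lagrange function leads to the stated results.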
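One way to see why the second derivatives wrt $\alpha$ of the two tasks differ only in sign is the following sketch. It is an added observation for the discrete case, not part of the original proof, and it assumes that p is evaluated at the most entropic distribution of general exponential form.

```latex
\begin{align*}
p_i = k(\lambda,\alpha)\, e^{-U(x_i,\lambda,\alpha)}
  \;&\Longrightarrow\; -\sum_{i=1}^{m} p_i \ln p_i = \mu(U) - \ln k(\lambda,\alpha), \\
L(p(\alpha)) &= -\sum_{i=1}^{m} p_i \ln p_i + m(U) - \mu(U) \\
  &= m(U) - \ln k(\lambda,\alpha) = -\,l(\lambda,\alpha).
\end{align*}
```

Along the family of most entropic distributions the Lagrange function is therefore the negative of the loglikelihood, so a maximum of the likelihood in $\alpha$ corresponds to a minimum of the entropy-based criterion in $\alpha$.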

In the following simple instance of the general potential the sufficient
conditions are analytically tractable, showing that at the points chosen
by the necessary conditions (4) the entropy function attains its maximum
in $p(\alpha)$ and its minimum in $\alpha$; hence the chosen distribution has minimal
entropy in the class of the most entropic distributions consistent with
the moment consistency constraints. The likelihood function attains its
maximum at the same points.

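Example 4. Find the sufficient conditions for the Example 3 set-up.
The general total potential is $U(x,\lambda,\alpha) = \lambda(x-\alpha)^2$, so the potential
is $u(x,\alpha) = (x-\alpha)^2$. The second derivatives stated in the above
Theorem then simplify into

$$\frac{\partial^2 l(\lambda,\alpha)}{\partial \lambda^2} = -Var(u)$$

$$\frac{\partial^2 l(\lambda,\alpha)}{\partial \alpha^2} = -\left(\lambda^2 Var(u'_{\alpha}) + \lambda\,(m(u''_{\alpha}) - \mu(u''_{\alpha}))\right)$$

$$\frac{\partial^2 l(\lambda,\alpha)}{\partial \lambda \partial \alpha} = -\left(\lambda\, Cov(u, u'_{\alpha}) + m(u'_{\alpha}) - \mu(u'_{\alpha})\right)$$

for the ML task, and into

$$\frac{\partial^2 L(p(\alpha))}{\partial p_i^2} = -\frac{1}{p_i}$$

$$\frac{\partial^2 L(p(\alpha))}{\partial \alpha^2} = \lambda^2 Var(u'_{\alpha}) + \lambda\,(m(u''_{\alpha}) - \mu(u''_{\alpha}))$$

for the ME task. Furthermore, in this case

$$m(u''_{\alpha}) - \mu(u''_{\alpha}) = 0$$

and also, due to the FOC's (4),

$$m(u'_{\alpha}) - \mu(u'_{\alpha}) = 0$$

Thus, the second derivatives for the ML task form a Hessian matrix

$$H_{ML} = -\begin{pmatrix}
Var(u) & \lambda\, Cov(u, u'_{\alpha}) \\
\lambda\, Cov(u, u'_{\alpha}) & \lambda^2 Var(u'_{\alpha})
\end{pmatrix}$$

which is negative definite, assuring in this case that the global maximum
was attained.

The ME task second derivatives are

$$\frac{\partial^2 L(p(\alpha))}{\partial p_i^2} = -\frac{1}{p_i}$$

$$\frac{\partial^2 L(p(\alpha))}{\partial \alpha^2} = \lambda^2 Var(u'_{\alpha}) = 4\lambda^2 Var(x)$$

showing that entropy attains its maximum in the distribution p, and its
minimum in $\alpha$, at the same point where the likelihood attains its maximum.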

This result was also supported by numerical investigations, which elucidate
the behavior. At the $\alpha$ suggested by the FOC's the entropy function attains
its minimum, whilst the maximum is attained for an $\alpha$ degenerating p
into a uniform distribution. This is no surprise: the value of the parameter
$\alpha$ of $u(x,\alpha)$ is free to choose, so when the goal is maximal entropy
the value is set such that the uniform distribution is reached.
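The behavior described above can be reproduced with a small experiment. The sketch below is not the authors' code: the support, the relative frequencies `r`, the bisection bracket and the grid over $\alpha$ are hypothetical choices. For each $\alpha$ it solves the ME task with the single potential $(x-\alpha)^2$, records the entropy of the solution, and then picks the $\alpha$ with minimal entropy.

```python
import math

support = [0, 1, 2, 3, 4, 5]                     # hypothetical support
r = [0.05, 0.15, 0.30, 0.25, 0.15, 0.10]         # hypothetical relative frequencies

def maxent_pmf(alpha):
    """Most entropic p with sum p_i (x_i-alpha)^2 = sum r_i (x_i-alpha)^2."""
    u = [(x - alpha) ** 2 for x in support]
    target = sum(ri * ui for ri, ui in zip(r, u))

    def pmf(lam):
        e = [-lam * ui for ui in u]
        top = max(e)                             # shift exponents to avoid overflow
        w = [math.exp(ei - top) for ei in e]
        z = sum(w)
        return [wi / z for wi in w]

    lo, hi = -20.0, 20.0                         # mu(u) decreases as lam grows
    for _ in range(200):                         # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        mu = sum(pi * ui for pi, ui in zip(pmf(mid), u))
        if mu > target:
            lo = mid
        else:
            hi = mid
    return pmf(0.5 * (lo + hi))

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

alphas = [1.0 + 0.01 * k for k in range(301)]    # grid over alpha in [1, 4]
profile = [entropy(maxent_pmf(a)) for a in alphas]
alpha_star = alphas[profile.index(min(profile))] # MiniMax entropy choice of alpha

# At alpha_star the second FOC, mu(x) = m(x), should hold up to grid resolution
p_star = maxent_pmf(alpha_star)
mean_gap = abs(sum(pi * x for pi, x in zip(p_star, support))
               - sum(ri * x for ri, x in zip(r, support)))
```

On these data the entropy profile has an interior minimum, and at that minimum the mean of the recovered distribution agrees with the sample mean up to the grid step, which is the remaining necessary condition of the ML task.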

The above analytically tractable case of the sufficient conditions
and several numerical investigations of more complex general potentials
lead us to propose a hypothesis about complementarity of the ML
and MiniMax Entropy tasks and identity of their solutions, under the
general exponential form and general potentials.

For the sake of completeness, the MiniMax Ent task is defined. 

Definition 7. MiniMax Entropy task. Given a sample and a vector
of known general potentials $u(x,\alpha)$, the MiniMax Entropy task is to
find, in the class of all most entropic distributions $p(\alpha)$ consistent with
the set of u-moment consistency conditions, a pmf/pdf with minimal
entropy.

Note. If the potentials are simple, MiniMax Ent task reduces into 
the ME task on simple potentials. 



4. CONCLUSIONS 

As a way of concluding we sum up the main points of the presented 
work: 

1) In light of the physical analogy mentioned in the Introduction, the
traditional statistical notion of exponential family (see for instance
(Brown, 1986), (Barndorff-Nielsen, 1978)) appeared to be too restrictive.
An extension to the general exponential form, driven by the analogy,
was proposed. Also, simple and general potentials were introduced into
the vocabulary of statistics.

2) Maximum Entropy task, as a typical instance of MaxEnt method 
and Maximum Likelihood task were defined in order to make clear the 
difference in operational mode of the two methods. 

3) Concept of complementarity was introduced and defined (see Note 
2 at the Section 3.1). Maximum Entropy task on simple potential and 
Maximum Likelihood task on simple exponential form were proved to 
be complementary. 

4) Exploration of the complementarity of MaxEnt on general potentials
and ML on the general exponential form (Sect. 3.2) led to a generalization
of MaxEnt into MiniMax Ent. It was proved that MiniMaxEnt on
general potentials and ML on the general exponential form lead to the same
necessary conditions. Whether the conditions are also sufficient cannot,
in general, be assessed analytically. A simple instance of general potential
(Example 4) as well as several numerical investigations suggest that this
is the case, and that full complementarity of MiniMaxEnt on general
(parametric) potentials and ML on the general exponential form can be
claimed.

5) Finally, we would like to note that the complementary relationship
of the MiniMaxEnt/MaxEnt task to the ML task seems to be
a specific property of Shannon's entropy criterion. In (Grendar and Grendar,
2000) it was shown that the so-called maximum empirical likelihood
(MEL) criterion constrained by moment consistency constraints, proposed
by (Mittelhammer et al., 2000) in the context of the noiseless linear
inverse problem, is not complementary with ML on the MEL recovered
class of pmf/pdf.